24  Z-test

A z-test is a statistical test used to determine whether there is a significant difference between a sample statistic and a population parameter, under the assumption that the population variance is known.

24.1 Z-test: Foundation in Standardization

  • Known Population Variance: The Z-test is used for hypothesis testing when the population variance is known and the sample size is large (a common rule of thumb is n > 30). Under these conditions the test statistic follows the standard normal distribution (Z-distribution), which makes the Z-test a simple introduction to hypothesis testing.

  • Standard Normal Distribution: The Z-test introduces the concept of the standard normal distribution, a critical foundational concept in statistics. Understanding how to standardize a score using the Z-distribution is fundamental and applies to many areas in statistics.

  • Large Sample Assumption: The Z-test is applicable in scenarios where the sample size is large enough to approximate the sampling distribution of the mean as normally distributed, due to the Central Limit Theorem. This concept is easier for beginners to grasp before moving on to situations where the sample size is small, and the population variance is unknown.
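The large-sample point can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the Uniform(0, 1) population, the sample size of 31, and the 2000 replicates are choices made here, not part of the chapter. It draws repeated samples from a non-normal population and compares the spread of the sample means with the value \(\sigma / \sqrt{n}\) predicted by the Central Limit Theorem.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

n = 31             # size of each sample (illustrative, just above the n > 30 rule of thumb)
replicates = 2000  # number of samples drawn (illustrative choice)

# Assumed population: Uniform(0, 1), which is clearly not normal.
population_sd = math.sqrt(1 / 12)  # exact standard deviation of Uniform(0, 1)

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(replicates)
]

# Spread of the sample means versus the theoretical standard error.
observed_se = statistics.stdev(sample_means)
theoretical_se = population_sd / math.sqrt(n)

print(f"SD of sample means: {observed_se:.4f}")
print(f"sigma / sqrt(n):    {theoretical_se:.4f}")
```

The two printed values should be close, which is why the standard normal distribution can be used for the sample mean even when the underlying data are not normal.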

24.2 Z-test Formula

\[Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}\]

In the formula for the z-test, the variables represent the following components:

  1. \(\bar{x}\) (Sample Mean): This is the mean (average) of the sample data. It represents the average value of the data points in the sample you’re testing.

  2. \(\mu\) (Population Mean): This is the mean (average) of the entire population or the mean of the distribution under the null hypothesis. It’s the value you compare the sample mean against to determine if there is a significant difference.

  3. \(\sigma\) (Population Standard Deviation): This is the standard deviation of the entire population. It measures the dispersion or variability of the population data points around the population mean (\(\mu\)). In the context of a z-test, it’s assumed to be known.

  4. \(n\) (Sample Size): This is the number of observations or data points in the sample. It’s used to calculate the standard error of the mean.

  5. \(Z\) (Z-score): The result of the z-test formula. In general, a z-score describes a value’s distance from the mean of a distribution, measured in standard deviations. In hypothesis testing, the Z-statistic expresses how far the sample mean is from the hypothesized population mean in units of the standard error (\(\sigma / \sqrt{n}\)).

  • The numerator (\(\bar{x} - \mu\)) represents the difference between the sample mean and the population mean. This difference shows how far the sample mean deviates from the population mean, which you’re testing for significance.

  • The denominator (\(\sigma / \sqrt{n}\)) is the standard error of the mean, which adjusts the population standard deviation (\(\sigma\)) for the size of the sample (\(n\)). This standard error measures how much the sample mean (\(\bar{x}\)) is expected to vary from one sample to another, purely by chance.
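The formula translates directly into code. The following is a minimal sketch, not a library API: the function names `standard_error` and `z_statistic` and the input numbers (sample mean 52, population mean 50, \(\sigma = 8\)) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def z_statistic(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """Z-statistic: (sample mean - population mean) / standard error."""
    return (sample_mean - mu) / standard_error(sigma, n)


# Hypothetical inputs: the standard error is 8 / sqrt(64) = 1, so Z = (52 - 50) / 1.
z = z_statistic(sample_mean=52.0, mu=50.0, sigma=8.0, n=64)
print(z)  # 2.0

# The standard error shrinks as the sample size grows.
for n in (16, 64, 256):
    print(n, standard_error(8.0, n))
```

Quadrupling the sample size halves the standard error, so the same difference between \(\bar{x}\) and \(\mu\) produces a larger Z-statistic in a larger sample.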

24.3 Z-test Example problem

Let’s consider a research question about the average weight of adults in a particular city. Suppose previous studies indicate that the average weight of adults in this city is 75 kg. We want to test whether a new sample of adults from a specific neighborhood has a significantly different average weight, which would suggest that the neighborhood has factors influencing weight.

Research question: Is the average weight of adults in this specific neighborhood different from the general city average of 75 kg, given that the population standard deviation is known to be 10 kg?

24.3.1 Hypothesis

  • Null Hypothesis (\(H_0\)): The average weight of adults in the neighborhood is 75 kg (\(\mu = 75\) kg).
  • Alternative Hypothesis (\(H_a\)): The average weight of adults in the neighborhood is not 75 kg (\(\mu \neq 75\) kg).

24.3.2 Data Collection

Suppose we collect a sample of 31 adults from the neighborhood and measure their weights. The sample provides the following statistics:

  • Sample mean weight (\(\bar{x}\)) ≈ 74.52 kg (calculated from the sample values listed below)
  • Population standard deviation (\(\sigma\)) is known to be 10 kg (from previous extensive studies)
  • Sample size (\(n\)) = 31

The sample values (in kg) are:

72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72

Let’s calculate the sample mean, the Z-statistic, and the p-value for this data set.

Calculate the Sample Mean (\(\bar{x}\)):

First count the observations to get the sample size (\(n = 31\)), then divide the sum of the values by \(n\):

\[ \bar{x} = \frac{\text{Sum of all sample values}}{n} \]

\[\bar{x} = \frac{72 + 75 + 71 + \cdots + 74 + 78 + 72}{31} = \frac{2310}{31} \approx 74.52\]

The sample mean (\(\bar{x}\)) is approximately 74.52 kg (74.516 before rounding).

Calculate Z-Statistic:

Given:

  • \(\mu = 75\) kg (population mean),
  • \(\sigma = 10\) kg (population standard deviation),
  • \(n=31\) (sample size).

We will use the Z-test formula: \[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

\[ Z = \frac{74.516 - 75}{10 / \sqrt{31}} = \frac{-0.484}{1.796} \approx -0.269 \]

The calculated Z-statistic is approximately -0.269.

  • The Z-statistic tells us how many standard errors (\(\sigma / \sqrt{n}\)) the sample mean is from the hypothesized population mean.
  • A Z-statistic of -0.269 indicates that the sample mean is only about 0.269 standard errors below the population mean.

Calculate P-Value

To determine whether this difference is statistically significant, we can compare the absolute value of the Z-statistic to a critical value from a Z-table. At a significance level of 0.05, the critical values for a two-tailed test are approximately ±1.96. Equivalently, we can compute the two-tailed p-value, \(p = 2\,(1 - \Phi(|Z|))\), where \(\Phi\) is the standard normal cumulative distribution function, and compare it to the significance level.

  • Since the absolute value of our Z-statistic (0.269) is much less than 1.96, we fail to reject the null hypothesis. The corresponding two-tailed p-value is approximately 0.788, well above 0.05. This means there is not enough statistical evidence to conclude that the average weight of adults in the neighborhood differs from the population mean of 75 kg.
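The Z-table lookup can also be done in code. The sketch below is one possible way to do it, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative, and the Z-statistic is entered by hand as the rounded value -0.269 from the calculation above.

```python
from statistics import NormalDist

alpha = 0.05  # significance level
z = -0.269    # rounded Z-statistic from the worked example

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Two-tailed critical value: the point that leaves alpha / 2 in the upper tail.
critical_value = std_normal.inv_cdf(1 - alpha / 2)

# Two-tailed p-value: probability of a statistic at least as extreme as |z|.
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# Both decision rules (critical value and p-value) lead to the same conclusion.
reject_null = abs(z) > critical_value

print(f"critical value: ±{critical_value:.3f}")
print(f"p-value: {p_value:.3f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Comparing \(|Z|\) with the critical value and comparing the p-value with the significance level are two views of the same decision rule, so they always agree.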

24.4 Z-Test calculation using Excel:

Download the Excel file link here

24.5 Z Test calculation with R:

Code
sample_weights <- c(72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72)

alpha <- 0.05

# Calculate the sample size
sample_size <- length(sample_weights)
sample_size
[1] 31
Code
# Calculate the sample mean
sample_mean <- mean(sample_weights)
sample_mean 
[1] 74.51613
Code
# Define population parameters
population_mean <- 75
population_sd <- 10

# Calculate the Z-statistic
z_statistic <- (sample_mean - population_mean) / (population_sd / sqrt(sample_size))
z_statistic
[1] -0.269408
Code
# Calculate the P-value for the two-tailed test
p_value <- 2 * (1 - pnorm(abs(z_statistic)))
p_value
[1] 0.7876158
Code
if (p_value < alpha) {
  cat("Reject null hypothesis\n")
} else {
  cat("Do not reject null hypothesis\n")
}
Do not reject null hypothesis

24.6 Z Test calculation with Python:

Code
import numpy as np
from scipy.stats import norm

sample_weights = np.array([72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72])
alpha = 0.05

# Sample size
sample_size = len(sample_weights)
sample_size
31
Code
# Sample mean
sample_mean = np.mean(sample_weights)
sample_mean
74.51612903225806
Code
# Population parameters
population_mean = 75  
population_sd = 10  

# Z-statistic
z_statistic = (sample_mean - population_mean) / (population_sd / np.sqrt(sample_size))
z_statistic
-0.2694079530401626
Code
# P-value for the two-tailed test
p_value = 2 * norm.sf(np.abs(z_statistic))
p_value
0.7876157667022055
Code
if p_value < alpha:
    print("Reject null hypothesis")
else:
    print("Do not reject null hypothesis")
Do not reject null hypothesis
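For repeated use, the steps above can be collected into a single function. This is a sketch under the same assumptions as the chapter (known population standard deviation, two-tailed test). The names `one_sample_z_test` and `ZTestResult` are hypothetical and do not come from any library, and the sketch uses the standard-library `statistics` module in place of NumPy and SciPy so that it has no external dependencies.

```python
from math import sqrt
from statistics import NormalDist, fmean
from typing import NamedTuple, Sequence


class ZTestResult(NamedTuple):
    z: float           # Z-statistic
    p_value: float     # two-tailed p-value
    reject_null: bool  # True when p_value < alpha


def one_sample_z_test(values: Sequence[float], mu0: float, sigma: float,
                      alpha: float = 0.05) -> ZTestResult:
    """Two-tailed one-sample z-test with a known population standard deviation."""
    n = len(values)
    z = (fmean(values) - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ZTestResult(z, p_value, p_value < alpha)


# Same data as the worked example.
weights = [72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72,
           75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72]

result = one_sample_z_test(weights, mu0=75, sigma=10)
print(result)
```

The returned values should match the step-by-step calculation: a Z-statistic of about -0.269, a p-value of about 0.788, and no rejection of the null hypothesis.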
